Semantic Text Segmentation and Sub-topic Extraction

Authors

  • Anshu Jain
  • Arati Kadav
  • Jaya Kawale
Abstract

Semantic text segmentation and sub-topic extraction divide an input text into coherent paragraphs and extract the topic of each one. This lets applications pull out relevant, meaningful data for text analysis tasks such as information retrieval and summarization. In this project we combine text tiling and latent semantic analysis in a standalone tool that segments documents and presents their sub-topics. An advantage of these techniques is that they do not depend on any knowledge base. We tested the tool on various combined articles from Google News, on stories, and on long articles, and it gave good results.
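The abstract names text tiling and latent semantic analysis but gives no implementation details. As a rough illustration only, and not the authors' tool, the Python sketch below shows the text-tiling half in a simplified form: it scores each gap between sentences by the cosine similarity of the word counts on either side and places a boundary at low local minima. The latent semantic analysis step is omitted; in a fuller version the raw count vectors would presumably be projected into a reduced space before comparison. All names here (`tokenize`, `gap_scores`, `segment`, `top_terms`), the stopword list, the block size, the threshold, and the use of most frequent terms as sub-topic labels are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "it", "that", "on", "for", "with", "as", "was", "by"}

# Hypothetical sample input: two sentences about one topic, two about another.
SENTENCES = [
    "The cat chased the mouse across the garden.",
    "The mouse escaped and the cat waited in the garden.",
    "The stock market fell as investors sold shares.",
    "Investors expect the market to recover when shares rise.",
]


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_scores(sentences, block=2):
    """Similarity of the `block` sentences before and after each sentence gap."""
    bags = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        scores.append(cosine(left, right))
    return scores


def segment(sentences, block=2, threshold=0.1):
    """Split at gaps whose score is a local minimum below `threshold`.

    This is a simplification: the original TextTiling algorithm uses
    depth scores rather than a fixed threshold.
    """
    scores = gap_scores(sentences, block)
    boundaries = []
    for i, score in enumerate(scores):
        left_ok = i == 0 or score <= scores[i - 1]
        right_ok = i == len(scores) - 1 or score <= scores[i + 1]
        if score < threshold and left_ok and right_ok:
            boundaries.append(i + 1)  # boundary falls before sentence i + 1
    segments, start = [], 0
    for end in boundaries + [len(sentences)]:
        segments.append(sentences[start:end])
        start = end
    return segments


def top_terms(segment_sentences, k=3):
    """Use the k most frequent terms as a crude sub-topic label."""
    counts = Counter(t for s in segment_sentences for t in tokenize(s))
    return [term for term, _ in counts.most_common(k)]
```

Calling `segment(SENTENCES)` on the sample input splits it between the second and third sentences, and `top_terms` can then be applied to each resulting segment. Because nothing beyond the input text is consulted, the sketch shares the property the abstract highlights: it does not rely on a knowledge base.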

Similar resources

A Topic Segmentation of Texts based on Semantic Domains

LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France. Email: [ferret,grau]@limsi.fr. Abstract: Thematic analysis is essential for many Natural Language Processing (NLP) applications, such as text summarization or information extraction. It is a two-dimensional process that has both to delimit the thematic segments of a text and to identify the topic of each of them. The system we present possesses th...

Prosody Modeling for Automatic Speech Recognition and Understanding

This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automati...

Prosody and topic structuring in spoken dialogue

Prosody is critical in conveying topic coherence and the salience of information in speech. In this study we propose that the overall coherence is brought about through pitch level structuring of phrases at both the local level of hierarchical phrase unit positioning and the global level of pitch baseline rise and fall as climax and resolution. Our results show that prosody has critical importa...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

Publication date: 2004